High-throughput protein analysis integrating bioinformatics and experimental assays.
Authors
Abstract
The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and managing relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.
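The integration step described in the abstract, where experimental results and bioinformatic annotations sit in one relational database so that targets can be selected with a query, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table and column names, the ORF identifiers and the rows are invented toy data, and SQLite (from the Python standard library) stands in for the MS SQL-Server database mentioned in the abstract. It is not the authors' schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orf (
            orf_id    TEXT PRIMARY KEY,
            length_aa INTEGER NOT NULL
        );
        -- Bioinformatic results: one row per predicted domain.
        CREATE TABLE domain_hit (
            orf_id      TEXT NOT NULL REFERENCES orf(orf_id),
            source_db   TEXT NOT NULL,
            domain_name TEXT NOT NULL
        );
        -- Experimental results: observed subcellular localization.
        CREATE TABLE localization (
            orf_id      TEXT NOT NULL REFERENCES orf(orf_id),
            compartment TEXT NOT NULL
        );
        """
    )
    # Invented placeholder rows, not real measurements or annotations.
    conn.executemany(
        "INSERT INTO orf VALUES (?, ?)",
        [("ORF_A", 412), ("ORF_B", 158), ("ORF_C", 603)],
    )
    conn.executemany(
        "INSERT INTO domain_hit VALUES (?, ?, ?)",
        [
            ("ORF_A", "PROSITE", "protein kinase"),
            ("ORF_B", "INTERPRO", "zinc finger"),
            ("ORF_C", "PROSITE", "protein kinase"),
        ],
    )
    conn.executemany(
        "INSERT INTO localization VALUES (?, ?)",
        [("ORF_A", "nucleus"), ("ORF_B", "nucleus"), ("ORF_C", "cytoplasm")],
    )
    return conn


def find_targets(conn, domain_name, compartment):
    """Return ORF ids that carry the given domain and localize to the given compartment."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.orf_id
        FROM orf AS o
        JOIN domain_hit   AS d ON d.orf_id = o.orf_id
        JOIN localization AS l ON l.orf_id = o.orf_id
        WHERE d.domain_name = ? AND l.compartment = ?
        ORDER BY o.orf_id
        """,
        (domain_name, compartment),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_targets(conn, "protein kinase", "nucleus"))
```

Keeping predicted and measured properties in separate tables keyed by the same ORF identifier is one plausible way to let a single join answer a combined question, such as which kinase-domain proteins were observed in the nucleus.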
Similar resources
Biochemoinformatics : Integrating Bioinformatics and Chemoinformatics for Drug Discovery and Development
A unique characteristic of the post-genome drug discovery and development process is the accessibility of a wide range of highly parallel and high-throughput experimental tools that are generating huge amounts of diverse data, including sequence data, chemical structure information, biological activity patterns and global expression profiles in mRNA, protein and other molecular levels. Integrat...
Genome-Scale Protein Function Prediction in Yeast Saccharomyces cerevisiae Through Integrating Multiple Sources of High-Throughput Data
As we are moving into the post genome-sequencing era, various high-throughput experimental techniques have been developed to characterize biological systems at the genome scale. Discovering new biological knowledge from high-throughput biological data is a major challenge for bioinformatics today. To address this challenge, we developed a Bayesian statistical method together with Boltzmann mach...
CellMissy: a tool for management, storage and analysis of cell migration data produced in wound healing-like assays
SUMMARY Automated image processing has allowed cell migration research to evolve to a high-throughput research field. As a consequence, there is now an unmet need for data management in this domain. The absence of a generic management system for the quantitative data generated in cell migration assays results in each dataset being treated in isolation, making data comparison across experiments ...
A statistical framework for combining and interpreting proteomic datasets
MOTIVATION To identify accurately protein function on a proteome-wide scale requires integrating data within and between high-throughput experiments. High-throughput proteomic datasets often have high rates of errors and thus yield incomplete and contradictory information. In this study, we develop a simple statistical framework using Bayes' law to interpret such data and combine information fr...
SAMNetWeb: identifying condition-specific networks linking signaling and transcription
MOTIVATION High-throughput datasets such as genetic screens, mRNA expression assays and global phospho-proteomic experiments are often difficult to interpret due to inherent noise in each experimental system. Computational tools have improved interpretation of these datasets by enabling the identification of biological processes and pathways that are most likely to explain the measured results....
Journal title:
- Nucleic Acids Research
Volume 32, Issue 2
Pages -
Publication date 2004